Abstract

Statistical inference on the mean of a Poisson distribution is a fundamentally important problem with modern applications in, e.g., particle physics. The discreteness of the Poisson distribution makes this problem surprisingly challenging, even in the large-sample case. Here we propose a new approach, based on the recently developed framework of inferential models (IMs). Specifically, we construct optimal, or at least approximately optimal, IMs for two important classes of assertions/hypotheses about the Poisson mean. For point assertions, we develop a novel recursive sorting algorithm to construct this optimal IM. Numerical comparisons of the proposed method to existing methods are given, for both the mean and the more challenging mean-plus-background problem.

Keywords and phrases: Belief function; constraint; plausibility function; predictive random set; recursive ordering; score function; validity.



1 Introduction 



Statistical inference based on discrete data, in particular, Poisson counts, is a fundamentally important and counterintuitively challenging problem. For example, modern inference problems in high-energy physics involve Poisson count data, and the combination of discreteness, small sample size, and occasional parameter constraints causes trouble for classical frequentist methods; see Mandelkern (2002), Brown et al. (2003), and the references therein. Bayesian methods, popular in part for their conceptual and computational simplicity, also suffer in such problems because, in addition to the uncertain choice of prior, the inferential output generally is not calibrated for easy interpretation by users.



So, these kinds of challenging problems apparently require new ways of handling uncertainty. In this paper, we apply the recently developed framework of inferential models (IMs) to this problem of inference on a Poisson mean.

The primary goal of statistical inference is the conversion of experience, in the form 
of observed data, into scientific knowledge. But in order for a consensus to ultimately 
be reached, it is desirable that the inferential output, i.e., measures of uncertainty about 
the truthfulness of any assertion/hypothesis of interest, be meaningful both within and 
across experiments. 

I. Meaningfulness within an experiment. The inferential output should depend on the observed data in a logical and meaningful way. For example, Bayesian posterior probabilities or p-values can, in principle, be plotted as functions of observed data, and sense can be made out of the relationships revealed in this plot; e.g., a hypothesis is more plausible for one data value than for another. On the other hand, frequentist hypothesis testing procedures, and the conclusions reached by them, are justified based on Type I and Type II error rates, which are calculated pre-data and, therefore, are not meaningful for the particular data observed in the given problem.

II. Meaningfulness across experiments. Inferential outputs should be suitably calibrated so that, if many similar experiments are conducted at different times or places, then the data-dependent measure of support for a true (resp. false) assertion should be large (resp. small) for a majority of the experiments, where "large/small" and "majority" have mathematical definitions available pre-experiment. The language of frequentist error rates can be used to describe such properties, but it is not the frequentist properties themselves that are important; rather, it is the interpretability of the inferential results derived from them.

As mentioned above, frequentist methods generally fail to satisfy Property I. In discrete data problems, such as Poisson, frequentist methods also tend to violate Property II: typically, large-sample approximations are used, which may not be appropriate in applications, and extreme care must be taken even when they are appropriate (Brown et al. 2003). Bayesian methods satisfy Property I, but without a carefully chosen reference prior, there is no guarantee that Property II is satisfied. Other methods for probabilistic inference are available, namely, Fisher's fiducial inference (Fisher 1973; Zabell 1992), its variants (Hannig 2009), and Dempster-Shafer theory (Dempster 2008; Shafer 1976). However, to be meaningful, fiducial probabilities must be interpreted subjectively and, therefore, do not generally satisfy the calibration in Property II.



The IM framework of Martin and Liu (2012) was built upon ideas first laid out in Martin et al. (2010) and Zhang and Liu (2011). The term "inferential model" reflects the understanding that an inferential method satisfying both Properties I and II generally requires something more than the "continue to regard" strategy of fiducial inference (Dempster 1963).

Martin and Liu (2012) develop a general and relatively simple three-step construction of an IM; the details of this construction are reviewed in Section 2. As a result of this careful reasoning with uncertainty, the IM framework identifies and corrects the inherent bias in Fisher's fiducial inference. Moreover, under very mild conditions, the IM output is shown to satisfy both desirable Properties I and II.
In this paper we specialize the general IM framework to the important Poisson problem, extending the naive analysis of this problem in Martin and Liu (2012) in two directions. After a brief introduction to the basic IM construction and its theoretical properties in Section 2, we present results on optimal IM construction for two important classes of assertions/hypotheses about the Poisson mean, namely, one- and two-sided assertions. Section 3 establishes a simple result on the optimal IM for one-sided assertions. The more challenging class of two-sided assertions is considered in Section 4. There we first develop some intuition about the optimal IM construction, and then propose a novel recursive algorithm for constructing an (approximately) optimal IM for two-sided assertions, which translates directly to interval estimates for the Poisson mean. Our second contribution is an extension to the problem where non-stochastic constraint information about the Poisson mean is available in addition to the observed data. This constrained Poisson mean problem has applications in high-energy physics, where signal counts cannot be directly distinguished from background noise. Numerical comparisons in Sections 4.4 and 4.5 show that the proposed method compares favorably to existing methods in terms of a variety of frequentist criteria. However, it is important to keep in mind that IMs are more than just a tool for constructing frequentist procedures: IMs produce prior-free posterior probabilistic inference, exactly what Fisher's fiducial inference was designed to achieve.



2 Brief review of IMs 



2.1 Definitions and basic construction 



Building on ideas in Martin et al. (2010) and Zhang and Liu (2011), Martin and Liu (2012) presented a general framework of prior-free, posterior probabilistic inference based on what are called inferential models (IMs). To fix notation, let $X$ be the observable data, taking values in a space $\mathbb{X}$, and let $\theta$ be the parameter of interest, taking values in the parameter space $\Theta$. Given the application we have in mind here, we shall assume $\Theta$ and $\mathbb{X}$ are subsets of $\mathbb{R}$. The starting point of the IM framework is similar to that of fiducial inference, in the sense that an auxiliary variable, denoted by $U$ and taking values in a space $\mathbb{U}$ with probability measure $P_U$, is associated with $X$ and $\theta$. It is this association, together with the distribution $U \sim P_U$, that characterizes the sampling distribution $X \sim P_{X \mid \theta}$. After observing $X = x$, the fiducial/Dempster-Shafer approach is to "continue to regard" (Dempster 1963) $U$ as a sample from $P_U$, and then invert the association to get a corresponding fiducial posterior distribution for $\theta$, given $X = x$.

The IM approach takes a different perspective. That is, instead of keeping the interpretation of $U$ as a random variable, the IM approach treats the unobserved value $u^\star$ of $U$, which is tied to the observed data $X = x$ and the true value of $\theta$, as the fundamental quantity. Then the goal is to predict this unobserved value $u^\star$ with a random set. It turns out that the success of the IM framework rests on the choice of this predictive random set, described in more detail next.

Start with a collection $\mathbb{S} = \{S_t : t \in T\}$ of $P_U$-measurable subsets of $\mathbb{U}$, indexed by some generic space $T$. This collection will serve as the support of the predictive random set. Martin and Liu (2012) showed that, for optimal predictive random sets, it suffices to assume that the collection $\mathbb{S}$ is nested in the sense that either $S_t \subseteq S_{t'}$ or $S_{t'} \subseteq S_t$ for all pairs $t, t' \in T$. We can now define the predictive random set $\mathcal{S}$, supported on $\mathbb{S}$, with "distribution function"
$$P_{\mathcal{S}}\{\mathcal{S} \subseteq S\} = P_U(S), \quad S \in \mathbb{S},$$
which we call the natural measure. Any predictive random set constructed in this way is admissible; the name "admissible" is based on the result (Martin and Liu 2012, Theorem 3) that, for any predictive random set, there is one in this admissible class that is as good or better. Therefore, without loss of efficiency, we may restrict attention to predictive random sets with nested supports equipped with the natural measure.



The following three steps, described in Martin and Liu (2012), define an IM: 

A-step. Associate $X$, $\theta$, and $U \sim P_U$ in a way consistent with the sampling distribution $X \sim P_{X \mid \theta}$ such that, for all $x \in \mathbb{X}$ and all $u \in \mathbb{U}$, it defines a unique subset $\Theta_x(u) \subseteq \Theta$, possibly empty, containing all possible candidate values of $\theta$ given $(x, u)$.

P-step. Predict the unobserved value $u^\star$ of $U$ associated with the observed data by an admissible predictive random set $\mathcal{S}$.

C-step. Combine $\mathcal{S}$ and the association $\Theta_x(u)$ specified in the A-step to obtain
$$\Theta_x(\mathcal{S}) = \bigcup_{u \in \mathcal{S}} \Theta_x(u). \tag{2.1}$$
Then compute the belief function
$$\mathrm{bel}_x(A; \mathcal{S}) = P_{\mathcal{S}}\{\Theta_x(\mathcal{S}) \subseteq A\}, \tag{2.2}$$
where $A \subseteq \Theta$ is the assertion/hypothesis about $\theta$ of interest.

The belief function is just one part of the inferential output. Since the belief function $\mathrm{bel}_x(A; \mathcal{S})$ is sub-additive, i.e., $\mathrm{bel}_x(A; \mathcal{S}) + \mathrm{bel}_x(A^c; \mathcal{S}) \le 1$, one actually needs both $\mathrm{bel}_x(A; \mathcal{S})$ and $\mathrm{bel}_x(A^c; \mathcal{S})$ to summarize the information in $x$ concerning the truthfulness of assertion $A$. In some cases, it is more convenient to report the plausibility function
$$\mathrm{pl}_x(A; \mathcal{S}) = P_{\mathcal{S}}\{\Theta_x(\mathcal{S}) \cap A \neq \emptyset\} = 1 - \mathrm{bel}_x(A^c; \mathcal{S}). \tag{2.3}$$



Often, Monte Carlo methods are required to evaluate the belief/plausibility functions. Also note that it is not necessary to use the same predictive random set for both $A$ and $A^c$. In fact, for optimal inference, Martin and Liu (2012) recommend using different predictive random sets for each point in $\Theta$; see Section 4.
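To make the C-step concrete, here is a minimal Monte Carlo sketch for the Poisson association of Section 3.1 and the one-sided assertion $A = (\theta_0, \infty)$. It assumes a predictive random set that is not constructed in this paper: the symmetric interval $\mathcal{S} = \{u : |u - 1/2| \le |U - 1/2|\}$ with $U \sim \mathrm{Unif}(0,1)$, which is nested and carries the natural measure. Because both endpoints of $\Theta_x(u)$ increase with $u$, the events $\Theta_x(\mathcal{S}) \subseteq A$ and $\Theta_x(\mathcal{S}) \cap A \neq \emptyset$ reduce to comparing the endpoints of $\mathcal{S}$ with $G_x(\theta_0)$ and $G_{x+1}(\theta_0)$. The function names are hypothetical.

```python
import math
import random


def poisson_cdf(x: int, theta: float) -> float:
    """F_theta(x) = P(X <= x) for X ~ Pois(theta); equals 0 for x < 0."""
    if x < 0:
        return 0.0
    term = total = math.exp(-theta)
    for k in range(1, x + 1):
        term *= theta / k
        total += term
    return min(total, 1.0)


def gamma_cdf(a: int, theta: float) -> float:
    """G_a(theta) for integer shape a (rate 1), via G_a(theta) = 1 - F_theta(a - 1).

    With the convention F_theta(-1) = 0 this gives G_0 = 1.
    """
    return 1.0 - poisson_cdf(a - 1, theta)


def mc_bel_pl(x: int, theta0: float, n_draws: int = 100_000, seed: int = 1):
    """Monte Carlo bel_x(A; S) and pl_x(A; S) for A = (theta0, inf).

    Assumed predictive random set: S = [a, b], a = 1/2 - |U - 1/2|,
    b = 1/2 + |U - 1/2|, U ~ Unif(0, 1), so that
    Theta_x(S) = [G_x^{-1}(a), G_{x+1}^{-1}(b)).
    """
    rng = random.Random(seed)
    g_lo = gamma_cdf(x, theta0)      # G_x(theta0)
    g_hi = gamma_cdf(x + 1, theta0)  # G_{x+1}(theta0)
    n_bel = n_pl = 0
    for _ in range(n_draws):
        half = abs(rng.random() - 0.5)
        a, b = 0.5 - half, 0.5 + half
        if a > g_lo:   # G_x^{-1}(a) > theta0: Theta_x(S) lies inside A
            n_bel += 1
        if b > g_hi:   # G_{x+1}^{-1}(b) > theta0: Theta_x(S) meets A
            n_pl += 1
    return n_bel / n_draws, n_pl / n_draws


bel, pl = mc_bel_pl(x=10, theta0=7.0)
print(f"bel = {bel:.3f}, pl = {pl:.3f}")
```

For this particular random set, $a$ is uniform on $(0, 1/2)$, so the exact belief is $\max\{0, 1 - 2G_x(\theta_0)\}$; with $x = 10$ and $\theta_0 = 7$ that is about 0.66, and the plausibility is 1 because $G_{11}(7) < 1/2$. The Monte Carlo estimates can be checked against these values.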



2.2 Validity and optimality 

The performance of a particular predictive random set is measured through the sampling behavior of the corresponding belief function, as a function of $X \sim P_{X \mid \theta}$, at a given assertion $A$. In particular, the IM is said to be valid at $A$ if
$$\sup_{\theta \in A^c} P_{X \mid \theta}\{\mathrm{bel}_X(A; \mathcal{S}) \ge 1 - \alpha\} \le \alpha, \quad \alpha \in (0, 1), \tag{2.4}$$
or, in other words, $\mathrm{bel}_X(A; \mathcal{S})$ is stochastically no larger than $\mathrm{Unif}(0, 1)$ when $X \sim P_{X \mid \theta}$ with $\theta \notin A$. This validity property is a mathematical description of Property II in Section 1. That is, if $A$ is false, then the amount of support in data $X$ for $A$ will be large only for a relatively small proportion of $X$ values. Martin and Liu (2012, Theorem 1) show that this validity property is easy to arrange: it holds whenever the predictive random set $\mathcal{S}$ is admissible in the sense described above.

As a consequence of the validity theorem, one can use the IM output, namely, the belief and plausibility functions, to construct frequentist decision procedures. For example, in a testing problem, $H_0: \theta \in A$ versus $H_1: \theta \notin A$, the testing rule
$$\text{reject } H_0 \text{ based on } X = x \text{ iff } \mathrm{pl}_x(A; \mathcal{S}) < \alpha \tag{2.5}$$
controls the frequentist Type I error rate at the nominal $\alpha$ level. One can also construct a $100(1 - \alpha)\%$ plausibility region for the unknown parameter by inverting this test,
$$\Pi_x(\alpha) = \{\theta : \mathrm{pl}_x(\theta; \mathcal{S}) \ge \alpha\},$$
where $\mathrm{pl}_x(\theta; \mathcal{S})$ is shorthand for $\mathrm{pl}_x(\{\theta\}; \mathcal{S})$. This plausibility region also has nominal frequentist coverage probability; see Martin and Liu (2012) for details. But we should emphasize here that, although plausibility functions can be used to construct frequentist procedures, they can also do much more. Indeed, the belief and plausibility functions provide meaningful prior-free posterior probabilistic evidence for the truthfulness of the claim "$\theta \in A$." In particular, any $\theta' \notin \Pi_x(\alpha)$ is a relatively implausible value for the true $\theta$ after observing $X = x$. Confidence/credible intervals simply do not have such a sharp interpretation.



Herein we focus only on IMs that are valid in the sense of (2.4). In that case, $\mathrm{bel}_X(A; \mathcal{S})$, as a function of $X$, is (probabilistically) not too large when $A$ is false. Towards optimality, we want $\mathrm{bel}_X(A; \mathcal{S})$ to be as large as possible without violating the validity condition. For this, a non-trivial upper bound on the belief function will be helpful. Given $A$, define a class of subsets of $\mathbb{U}$ indexed by $x \in \mathbb{X}$:
$$\mathbb{U}_x(A) = \{u \in \mathbb{U} : \Theta_x(u) \subseteq A\}. \tag{2.6}$$
In words, $\mathbb{U}_x(A)$ contains all those $u$ such that, given $x$, the corresponding $\theta$ values all agree with the assertion $A$. It can be shown that $P_U\{\mathbb{U}_x(A)\}$ is the fiducial/Dempster-Shafer posterior probability for $A$, given data $x$. This fiducial probability can also be written as an IM belief function, i.e.,
$$P_U\{\mathbb{U}_x(A)\} = \mathrm{bel}_x(A; \mathcal{S}_0), \quad \text{where } \mathcal{S}_0 = \{U\},\ U \sim P_U. \tag{2.7}$$



Martin and Liu (2012, Proposition 1) show that, for any admissible predictive random set $\mathcal{S}$, $\mathrm{bel}_x(A; \mathcal{S})$ is bounded above by $P_U\{\mathbb{U}_x(A)\}$ for all $x$. If it happens that $\{\mathbb{U}_x(A) : x \in \mathbb{X}\}$ is nested, then an admissible predictive random set $\mathcal{S}^\star$ exists such that the upper bound is attained, i.e., $\mathrm{bel}_x(A; \mathcal{S}^\star) = \mathrm{bel}_x(A; \mathcal{S}_0)$ for all $x$. In this case, we say that the IM corresponding to $\mathcal{S}^\star$ is optimal. We summarize this result as follows.



Proposition 1. Given an assertion $A$, suppose that $\{\mathbb{U}_x(A) : x \in \mathbb{X}\}$ defined in (2.6) forms a nested collection of sets. Then there exists an admissible predictive random set $\mathcal{S}^\star$ such that $\mathrm{bel}_x(A; \mathcal{S}^\star) = \mathrm{bel}_x(A; \mathcal{S}_0)$ for all $x$.

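Proof. Take the index set $T = \mathbb{X}$ and define the support $\mathbb{S} = \{\mathbb{U}_x(A) : x \in \mathbb{X}\}$. This collection is nested by hypothesis. Take $\mathcal{S}^\star$ to be the predictive random set determined by the natural measure $P_{\mathcal{S}^\star}\{\mathcal{S}^\star \subseteq S\} = P_U(S)$, $S \in \mathbb{S}$, as in Section 2.1. Then $\mathcal{S}^\star$ is admissible. Furthermore,
$$\mathrm{bel}_x(A; \mathcal{S}^\star) = P_{\mathcal{S}^\star}\{\Theta_x(\mathcal{S}^\star) \subseteq A\} = P_{\mathcal{S}^\star}\{\mathcal{S}^\star \subseteq \mathbb{U}_x(A)\} = P_U\{\mathbb{U}_x(A)\}.$$
Since the right-hand side equals $\mathrm{bel}_x(A; \mathcal{S}_0)$, the claim follows. □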
Proposition 1 resolves the issue of optimal IMs in problems where $\{\mathbb{U}_x(A) : x \in \mathbb{X}\}$ is nested; see Section 3. However, in other cases, like in Section 4, these sets are not nested, so further considerations are needed. Martin and Liu (2012) develop a theory of optimal IMs for continuous data models, and steps towards optimality in the discrete Poisson data problem are discussed in Section 4.

3 Poisson inference for one-sided assertions 
3.1 A simple Poisson association 

For the Poisson model, $X \sim \mathrm{Pois}(\theta)$, the probability mass function is $f_\theta(x) = e^{-\theta}\theta^x / x!$, $x = 0, 1, 2, \ldots$, and the distribution function $F_\theta(x)$ satisfies
$$F_\theta(x) = 1 - G_{x+1}(\theta), \quad x = 0, 1, 2, \ldots, \quad \theta > 0,$$
where $G_a$ is the gamma distribution function with shape parameter $a$ and rate parameter unity (with the convention $F_\theta(-1) = 0$, i.e., $G_0 \equiv 1$). Following Martin and Liu (2012), we introduce $U \sim P_U = \mathrm{Unif}(0, 1)$ and define the association between data $X$, parameter $\theta$, and auxiliary variable $U$ as
$$F_\theta(X - 1) \le 1 - U < F_\theta(X), \quad U \sim \mathrm{Unif}(0, 1). \tag{3.1}$$
It is clear that this association characterizes the posited Poisson sampling model; this is the familiar recipe for simulation from the Poisson distribution. Using the connection between the Poisson and gamma distribution functions, we can rewrite (3.1), for generic $(x, \theta, u)$, as $G_{x+1}(\theta) < u \le G_x(\theta)$, and, by inversion, we have
$$\Theta_x(u) = [G_x^{-1}(u), G_{x+1}^{-1}(u)), \tag{3.2}$$
the set of all candidate $\theta$'s, given $(x, u)$.
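The association (3.1) and its inversion (3.2) can be sketched with the Python standard library alone. Because the standard library has no gamma quantile function, $G_a^{-1}(u)$ is computed here by bisection on $G_a(\theta) = 1 - F_\theta(a - 1)$; this numerical route and the helper names are our own choices for illustration, not part of the paper.

```python
import math


def poisson_cdf(x: int, theta: float) -> float:
    """F_theta(x) for X ~ Pois(theta); 0 for x < 0."""
    if x < 0:
        return 0.0
    term = total = math.exp(-theta)
    for k in range(1, x + 1):
        term *= theta / k
        total += term
    return min(total, 1.0)


def poisson_from_u(u: float, theta: float) -> int:
    """The x with F_theta(x - 1) <= 1 - u < F_theta(x), as in (3.1); requires 0 < u < 1."""
    v = 1.0 - u
    x = 0
    while poisson_cdf(x, theta) <= v:
        x += 1
    return x


def gamma_cdf_inv(a: int, u: float, tol: float = 1e-12) -> float:
    """G_a^{-1}(u) by bisection; G_0 is identically 1, so G_0^{-1}(u) is taken as 0."""
    if a == 0:
        return 0.0
    G = lambda t: 1.0 - poisson_cdf(a - 1, t)   # increasing in t
    lo, hi = 0.0, 1.0
    while G(hi) < u:            # expand the bracket until G(hi) >= u
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theta_set(x: int, u: float):
    """Endpoints of Theta_x(u) = [G_x^{-1}(u), G_{x+1}^{-1}(u)) from (3.2)."""
    return gamma_cdf_inv(x, u), gamma_cdf_inv(x + 1, u)


lo, hi = theta_set(3, 0.5)
print(lo, hi, poisson_from_u(0.5, 0.5 * (lo + hi)))
```

Any $\theta$ inside the returned interval regenerates the same count $x$ from the same $u$ through (3.1), which is exactly what "candidate values of $\theta$ given $(x, u)$" means. For $x = 0$ the interval is $[0, -\log(1 - u))$, since $G_1(\theta) = 1 - e^{-\theta}$.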

3.2 Optimal IMs 

Let $\theta_0 > 0$ be an arbitrary but fixed value, and consider the assertion $A = (\theta_0, \infty)$. This assertion is "one-sided" in the same sense that the alternative hypothesis $H_1: \theta > \theta_0$ in the classical testing context is one-sided. In this case, using (3.2), the sets $\mathbb{U}_x(A)$ defined in (2.6) are given by
$$\mathbb{U}_x(A) = \{u : G_x^{-1}(u) > \theta_0\} = \{u : u > G_x(\theta_0)\} = \{u : u > 1 - F_{\theta_0}(x - 1)\}.$$
Since $F_{\theta_0}(\cdot)$ is a non-decreasing function, it follows that $\mathbb{U}_x(A) \subseteq \mathbb{U}_{x'}(A)$ for non-negative integers $x \le x'$. Since these sets are nested, there is an optimal IM that can be obtained as in the proof of Proposition 1. This optimal IM has belief function
$$\mathrm{bel}_x(A; \mathcal{S}_A^\star) = P_U\{\mathbb{U}_x(A)\} = F_{\theta_0}(x - 1), \quad P_U = \mathrm{Unif}(0, 1).$$
Here we use the notation $\mathcal{S}_A^\star$ to denote the predictive random set corresponding to the optimal IM for the assertion $A = (\theta_0, \infty)$.
Now consider $A^c = (0, \theta_0]$, the alternate one-sided assertion. Calculations similar to those displayed above show that $\mathbb{U}_x(A^c) = \{u : u \le 1 - F_{\theta_0}(x)\} = (0, 1 - F_{\theta_0}(x)]$. Since these again are nested, the optimal IM for $A^c$ has belief function
$$\mathrm{bel}_x(A^c; \mathcal{S}_{A^c}^\star) = P_U\{\mathbb{U}_x(A^c)\} = 1 - F_{\theta_0}(x).$$
To summarize, for the one-sided assertion $A = (\theta_0, \infty)$, an optimal IM exists and can be found via Proposition 1. Specifically, for a given $X = x$, the corresponding optimal belief and plausibility function pair is given by
$$\{\mathrm{bel}_x(A), \mathrm{pl}_x(A)\} = \{F_{\theta_0}(x - 1), F_{\theta_0}(x)\}.$$
Some connections between the IM results and classical hypothesis testing are worth mentioning here. First, observe that the plausibility function is exactly Fisher's p-value for testing the null hypothesis $H_0: \theta \in A$. That is, the p-value can be interpreted as an upper bound on the belief probability that the null hypothesis is true. Second, as described in Martin and Liu (2012), an IM-based frequentist testing rule would reject $H_0: \theta \in A$ based on observed $X = x$ if the plausibility function $\mathrm{pl}_x(A)$ is too small, i.e., if $\mathrm{pl}_x(A) < \alpha$. They show that such a testing rule controls the frequentist Type I error at level $\alpha$. But, in addition, if we ignore randomization issues, then this same rule, with $\mathrm{pl}_x(A) = F_{\theta_0}(x)$, corresponds to the Neyman-Pearson most powerful test.
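The closed-form one-sided result is easy to evaluate directly. The sketch below (standard library only; function names hypothetical) returns the optimal pair $\{F_{\theta_0}(x - 1), F_{\theta_0}(x)\}$ and checks validity (2.4) exactly. Because $F_{\theta_0}(X - 1)$ is non-decreasing in $X$ and $X$ is stochastically increasing in $\theta$, the supremum in (2.4) over $A^c = (0, \theta_0]$ is attained at $\theta = \theta_0$, so only that point is checked; the truncation of the sum at `x_max` is an illustration choice.

```python
import math


def poisson_pmf(x: int, theta: float) -> float:
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))


def poisson_cdf(x: int, theta: float) -> float:
    """F_theta(x); 0 for x < 0."""
    return sum(poisson_pmf(k, theta) for k in range(x + 1)) if x >= 0 else 0.0


def one_sided_bel_pl(x: int, theta0: float):
    """Optimal (bel, pl) for A = (theta0, inf): (F_theta0(x - 1), F_theta0(x))."""
    return poisson_cdf(x - 1, theta0), poisson_cdf(x, theta0)


def validity_prob(theta0: float, alpha: float, x_max: int = 200) -> float:
    """P_{theta0}{bel_X(A) >= 1 - alpha}, summing the pmf over x = 0, ..., x_max."""
    return sum(poisson_pmf(x, theta0)
               for x in range(x_max + 1)
               if one_sided_bel_pl(x, theta0)[0] >= 1.0 - alpha)


print(one_sided_bel_pl(10, 7.0))   # (F_7(9), F_7(10)), about (0.830, 0.901)
print(validity_prob(7.0, 0.1))     # should not exceed 0.1
```

The second component of `one_sided_bel_pl` is also the classical one-sided p-value $P_{\theta_0}(X \le x)$, which makes the connection to Fisher's p-value noted above easy to confirm numerically.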



4 Poisson inference for two-sided assertions 

Consider a singleton assertion $A = \{\theta_0\}$ for some fixed $\theta_0 > 0$. This corresponds to a point null hypothesis $H_0: \theta = \theta_0$ as in the classical setting. It is well known that point nulls and, hence, singleton assertions are closely tied to the important problem of constructing confidence/plausibility intervals. In this section we focus our attention on the complement $A^c = \{\theta_0\}^c$, a so-called "two-sided" assertion. For this two-sided assertion, the sets $\mathbb{U}_x(\{\theta_0\}^c)$ are
$$\begin{aligned} \mathbb{U}_x(\{\theta_0\}^c) &= \{u : G_{x+1}^{-1}(u) \le \theta_0\} \cup \{u : G_x^{-1}(u) > \theta_0\} \\ &= \{u : u \le G_{x+1}(\theta_0)\} \cup \{u : u > G_x(\theta_0)\} \\ &= (0, 1) \setminus (G_{x+1}(\theta_0), G_x(\theta_0)]. \end{aligned} \tag{4.1}$$
It is clear from the latter expression that the sets $\mathbb{U}_x(\{\theta_0\}^c)$ are not nested. Therefore, Proposition 1 does not help to identify an optimal IM; something more is needed.

4.1 Nesting predictive random sets via intersections 

Following the intuition developed in Proposition 1, we see that the use of the sets $\{\mathbb{U}_x(\{\theta_0\}^c) : x \in \mathbb{X}\}$ is desirable. But in order for the corresponding belief function to be valid, these sets need to be modified to make them nested. One way this can be accomplished is by iteratively taking intersections, i.e., order the sets as $\{\mathbb{U}_{x_k}(\{\theta_0\}^c) : k \ge 1\}$ and define $S_1 = \mathbb{U}_{x_1}(\{\theta_0\}^c)$, $S_2 = \mathbb{U}_{x_2}(\{\theta_0\}^c) \cap S_1$, and so on. The following two-step procedure describes this idea in more detail.
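1. Choose a ranking $\rho$ on $\mathbb{X}$, i.e., an ordering of $\{\mathbb{U}_x(\{\theta_0\}^c) : x \in \mathbb{X}\}$.

2. Let $T = \{1, 2, \ldots\}$ and define $\mathbb{S}_\rho = \{S_t^\rho : t \in T\}$ as follows. Set $S_0^\rho = \emptyset$ and
$$S_t^\rho = \bigcap_{x : \rho(x) > t} \mathbb{U}_x(\{\theta_0\}^c) = \bigcup_{x : \rho(x) \le t} (G_{x+1}(\theta_0), G_x(\theta_0)], \quad t = 1, 2, \ldots,$$
where the last equality follows from (4.1), because the intervals $(G_{x+1}(\theta_0), G_x(\theta_0)]$, $x \in \mathbb{X}$, partition the unit interval.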



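For each $\rho$, the collection $\mathbb{S}_\rho$ is nested, so if it is equipped with the natural measure of Section 2.1, then we obtain an admissible predictive random set $\mathcal{S}_\rho$. Since $S_{\rho(x)-1}^\rho$ is the largest of the $S_t^\rho$'s that is contained in $\mathbb{U}_x(\{\theta_0\}^c)$, it follows that
$$\mathrm{bel}_x(\{\theta_0\}^c; \mathcal{S}_\rho) = P_U\{S_{\rho(x)-1}^\rho\} = \sum_{x' : \rho(x') < \rho(x)} [G_{x'}(\theta_0) - G_{x'+1}(\theta_0)] = \sum_{x' : \rho(x') < \rho(x)} f_{\theta_0}(x'),$$
and, consequently, the corresponding plausibility function is
$$\mathrm{pl}_x(\theta_0; \mathcal{S}_\rho) = \mathrm{pl}_x(\{\theta_0\}; \mathcal{S}_\rho) = 1 - \sum_{x' : \rho(x') < \rho(x)} f_{\theta_0}(x').$$
It follows from the general theory that the IM based on $\mathcal{S}_\rho$ is valid for any ranking $\rho$. Following Martin and Liu (2012), the optimal $\rho$ is such that $\mathrm{bel}_X(\{\theta_0\}^c; \mathcal{S}_\rho)$ is largest (probabilistically) under $X \sim \mathrm{Pois}(\theta)$, $\theta \neq \theta_0$.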
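For a given ranking $\rho$, the belief and plausibility above are partial sums of the $\mathrm{Pois}(\theta_0)$ mass function. In the sketch below, the ranking is supplied as a finite Python list whose first element has $\rho = 1$; the finite truncation is an assumption made for illustration. The sketch also confirms numerically that $P_U(S_t^\rho)$, the total length of the intervals $(G_{x+1}(\theta_0), G_x(\theta_0)]$ with $\rho(x) \le t$, equals the same partial sum. The ranking by distance to $\theta_0$ used here is only an example, not the optimal $\rho^\star$.

```python
import math


def poisson_pmf(x: int, theta: float) -> float:
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))


def G(a: int, theta: float) -> float:
    """Gamma(a, 1) cdf for integer a via G_a(theta) = 1 - F_theta(a - 1); G_0 = 1."""
    return 1.0 - sum(poisson_pmf(k, theta) for k in range(a))


def ranked_bel_pl(x: int, ranking: list, theta0: float):
    """(bel_x({theta0}^c; S_rho), pl_x(theta0; S_rho)) for a ranking given as a list."""
    ahead = ranking[: ranking.index(x)]          # all x' with rho(x') < rho(x)
    bel = sum(poisson_pmf(xp, theta0) for xp in ahead)
    return bel, 1.0 - bel


def natural_measure(t: int, ranking: list, theta0: float) -> float:
    """P_U(S_t^rho): total length of (G_{x+1}(theta0), G_x(theta0)] over rho(x) <= t."""
    return sum(G(xp, theta0) - G(xp + 1, theta0) for xp in ranking[:t])


theta0 = 4.0
ranking = sorted(range(40), key=lambda z: (abs(z - theta0), z))  # example ranking only
print(ranked_bel_pl(4, ranking, theta0), ranked_bel_pl(2, ranking, theta0))
```

The top-ranked value always receives plausibility 1, and each later value loses exactly the mass of everything ranked ahead of it, which is why the choice of $\rho$ is the whole design problem studied next.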



4.2 Optimal ordering: some intuition 

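Towards an optimal ordering, we consider the distribution of $\mathrm{bel}_X(\{\theta_0\}^c; \mathcal{S}_\rho)$ as a function of $X \sim P_{X \mid \theta} = \mathrm{Pois}(\theta)$, for $\theta \neq \theta_0$. Consider the event $\{\mathrm{bel}_X(\{\theta_0\}^c; \mathcal{S}_\rho) < \mathrm{bel}_x(\{\theta_0\}^c; \mathcal{S}_\rho)\}$, for a given $x \in \mathbb{X}$. Then the $P_{X \mid \theta}$-probability of this event is like the distribution function of $\mathrm{bel}_X(\{\theta_0\}^c; \mathcal{S}_\rho)$, i.e.,
$$\psi_x(\theta) = P_{X \mid \theta}\{\mathrm{bel}_X(\{\theta_0\}^c; \mathcal{S}_\rho) < \mathrm{bel}_x(\{\theta_0\}^c; \mathcal{S}_\rho)\} = \sum_{x' : \rho(x') < \rho(x)} f_\theta(x'), \tag{4.2}$$
which we treat as a function of $\theta$ for each fixed $x$; the dependence on the ranking $\rho$ is implicit in the notation. For optimality, we want the belief function to be as large as possible without breaking the validity requirement. So we follow Martin and Liu (2012) and impose on $\rho$ the condition that
$$\psi_x(\theta) \text{ is maximized at } \theta = \theta_0 \text{ for each } x. \tag{4.3}$$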



By (4.2) and (4.3), and since $(\partial/\partial\theta) f_\theta(x) = T_\theta(x) f_\theta(x)$, the derivative of $\psi_x(\theta)$ with respect to $\theta$ vanishes at $\theta_0$, i.e.,
$$\sum_{x' : \rho(x') < \rho(x)} T_{\theta_0}(x') f_{\theta_0}(x') = 0, \quad \forall\, x \in \mathbb{X}, \tag{4.4}$$
where $T_\theta(x) = (\partial/\partial\theta) \log f_\theta(x) = x/\theta - 1$ is the score function. Recall that, in many cases, including the Poisson example considered here, the score function has zero expectation. Therefore, we refer to (4.4) as the score-balance condition; that is, in order to satisfy (4.4), the ranking $\rho$ must be suitably symmetric, or balanced, with respect to the sampling distribution of $T_{\theta_0}(X)$ under $X \sim \mathrm{Pois}(\theta_0)$.



By (4.3), the second derivative of $\psi_x(\theta)$ with respect to $\theta$, at $\theta = \theta_0$, satisfies
$$\sum_{x' : \rho(x') < \rho(x)} V_{\theta_0}(x') f_{\theta_0}(x') \le 0, \quad \forall\, x \in \mathbb{X}, \tag{4.5}$$
where $V_{\theta_0}(x) = T_{\theta_0}(x)^2 + (\partial/\partial\theta) T_\theta(x)\big|_{\theta = \theta_0}$. Consequently, the ranking $\rho$ must be chosen so that (4.5) holds in addition to (4.4). Following a remark about notation, we give some intuition for how this can be accomplished.

In what follows, for ease of interpretation, we report the algorithm and numerical results in the current parametrization, where $\theta$ is the mean of the Poisson distribution. However, it is theoretically more convenient to work with the natural parameter in the exponential family representation. So by working first with the parameter $\eta = \log\theta$, i.e., differentiating with respect to $\eta$, and then substituting $\theta = e^\eta$, we have
$$T_{\theta_0}(x) = x - \theta_0 \quad \text{and} \quad V_{\theta_0}(x) = (x - \theta_0)^2 - \theta_0. \tag{4.6}$$
These expressions are different from what is obtained by working with $\theta$ throughout.
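A quick numerical check of (4.6), as a sketch in the natural parametrization $\eta = \log\theta$: for a set $E$, the first and second $\eta$-derivatives of $\sum_{x \in E} f_{e^\eta}(x)$ at $\eta_0 = \log\theta_0$ should match $\sum_{x \in E} T_{\theta_0}(x) f_{\theta_0}(x)$ and $\sum_{x \in E} V_{\theta_0}(x) f_{\theta_0}(x)$. Over the whole (truncated) support, both sums are essentially zero, because $\mathrm{E}(X) = \mathrm{Var}(X) = \theta_0$. The set $E$, the truncation, and the finite-difference steps are arbitrary illustration choices.

```python
import math


def poisson_pmf(x: int, theta: float) -> float:
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))


def score(x: int, theta0: float) -> float:
    return x - theta0                    # T_{theta0}(x) in (4.6)


def second(x: int, theta0: float) -> float:
    return (x - theta0) ** 2 - theta0    # V_{theta0}(x) in (4.6)


def psi(eta: float, E) -> float:
    """sum_{x in E} f_theta(x) with theta = exp(eta)."""
    return sum(poisson_pmf(x, math.exp(eta)) for x in E)


theta0, E = 7.0, [5, 6, 7, 8, 9]
eta0 = math.log(theta0)

# Central finite differences with respect to eta.
h1, h2 = 1e-5, 1e-4
d1 = (psi(eta0 + h1, E) - psi(eta0 - h1, E)) / (2 * h1)
d2 = (psi(eta0 + h2, E) - 2 * psi(eta0, E) + psi(eta0 - h2, E)) / h2 ** 2

T_E = sum(score(x, theta0) * poisson_pmf(x, theta0) for x in E)
V_E = sum(second(x, theta0) * poisson_pmf(x, theta0) for x in E)

# Over (almost) the whole support, both sums vanish.
support = range(0, 120)
T_all = sum(score(x, theta0) * poisson_pmf(x, theta0) for x in support)
V_all = sum(second(x, theta0) * poisson_pmf(x, theta0) for x in support)
print(d1, T_E, d2, V_E, T_all, V_all)
```

For this central set $E$ every $V_{\theta_0}(x)$ is negative, so the second derivative is negative, which is the behavior (4.5) asks of the partial sums.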



In order to achieve (4.5), the basic idea is to choose $\rho$ such that $x$ values with small values of $|T_{\theta_0}(x)| = |x - \theta_0|$ are ranked first, i.e., are assigned small values of $\rho(x)$. This is based on the fact that $V_{\theta_0}(x) = (x - \theta_0)^2 - \theta_0$ is a quadratic in $T_{\theta_0}(x)$, and so $V_{\theta_0}(x)$ is smallest for $x$ with small absolute score. Unfortunately, the problem is not this simple, because this intuition fails to account for the multiplication by the probability mass function in (4.5). Due to the discreteness, an optimal ranking $\rho^\star$ satisfying both (4.4) and (4.5) does not exist in general. But the formal algorithm described in the following subsection recursively defines a permutation that approximately achieves this optimal ordering.



4.3 Optimal ordering: a recursive scheme 

Here we construct an increasing sequence $\{E_r : r \ge 0\}$ of subsets of $\mathbb{X}$, with $E_0 = \emptyset$. From these, the (approximately) optimal ranking $\rho^\star$ is obtained as $\rho^\star(E_r \setminus E_{r-1}) = r$.

Recall that, here, we are working with the abused notation described above. That is, we start out with the Poisson distribution indexed by the natural parameter $\eta$, the log of the mean, and then substitute $\theta = e^\eta$ back into the expressions for the score function, etc. Define two subsets of $\mathbb{X}$:
$$\mathbb{X}^+ = \{x \in \mathbb{X} : T_{\theta_0}(x) \ge 0\} \quad \text{and} \quad \mathbb{X}^- = \{x \in \mathbb{X} : T_{\theta_0}(x) < 0\}.$$

These sets, with non-negative and negative scores, respectively, will be updated iteratively in the algorithm that follows. The basic idea is to choose $E_r$, containing elements of both $\mathbb{X}^+$ and $\mathbb{X}^-$, in such a way that (4.4) and (4.5) hold, at least approximately. Algorithm 1 gives the details. R code to implement this procedure is available at www.math.uic.edu/~rgmartin. Line 22 of Algorithm 1 (the final "else stop" branch) stops the algorithm if both proxies, $\nu_r(1)$ and $\nu_r(2)$, for the left-hand side of (4.5) are positive. In our experience, this stopping condition is never triggered.



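As we described previously, it is intuitively clear that the recursive ordering scheme in Algorithm 1 will determine an ordering $\rho = \rho^\star$ such that (4.4) and (4.5) approximately hold. In Algorithm 1, any set that is not explicitly updated at iteration $r$ is carried over unchanged, e.g., $\mathbb{X}_r^+ = \mathbb{X}_{r-1}^+$ whenever only $\mathbb{X}^-$ is updated.

Algorithm 1: Recursive ordering.
    Given tolerance $\varepsilon > 0$, take finite $\mathbb{X}_\varepsilon \subset \mathbb{X}$ such that $P_{X \mid \theta_0}\{\mathbb{X}_\varepsilon\} > 1 - \varepsilon$.
    initialize $\mathbb{X}_0^+ = \mathbb{X}^+ \cap \mathbb{X}_\varepsilon$, $\mathbb{X}_0^- = \mathbb{X}^- \cap \mathbb{X}_\varepsilon$, $E_0 = \emptyset$, $r = 1$;
    while $r < \#(\mathbb{X}_\varepsilon)$ do
        if $\mathbb{X}_{r-1}^+ = \emptyset$ then
            $E_r = E_{r-1} \cup \{\max \mathbb{X}_{r-1}^-\}$;
            $\mathbb{X}_r^- = \mathbb{X}_{r-1}^- \setminus \{\max \mathbb{X}_{r-1}^-\}$;
        else if $\mathbb{X}_{r-1}^- = \emptyset$ then
            $E_r = E_{r-1} \cup \{\min \mathbb{X}_{r-1}^+\}$;
            $\mathbb{X}_r^+ = \mathbb{X}_{r-1}^+ \setminus \{\min \mathbb{X}_{r-1}^+\}$;
        else
            $E_r(1) = E_{r-1} \cup \{\min \mathbb{X}_{r-1}^+\}$;
            $E_r(2) = E_{r-1} \cup \{\max \mathbb{X}_{r-1}^-\}$;
            for $k = 1, 2$ do
                $\tau_r(k) = (d/d\theta) \log \sum_{x \in E_r(k)} f_\theta(x) \big|_{\theta = \theta_0}$;
                $\nu_r(k) = \sum_{x \in E_r(k)} V_{\theta_0}(x) f_{\theta_0}(x)$;
            end for
            if $|\tau_r(1)| < |\tau_r(2)|$ and $\nu_r(1) < 0$ then
                $E_r = E_r(1)$;
                $\mathbb{X}_r^+ = \mathbb{X}_{r-1}^+ \setminus \{\min \mathbb{X}_{r-1}^+\}$;
            else if $\nu_r(2) < 0$ then
                $E_r = E_r(2)$;
                $\mathbb{X}_r^- = \mathbb{X}_{r-1}^- \setminus \{\max \mathbb{X}_{r-1}^-\}$;
            else stop
            end if
        end if
        $r = r + 1$;
    end while

Here we do a numerical check to confirm this claim. Let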

$$T(r) = \sum_{x \in E_r} T_{\theta_0}(x) f_{\theta_0}(x) \quad \text{and} \quad V(r) = \sum_{x \in E_r} V_{\theta_0}(x) f_{\theta_0}(x), \tag{4.7}$$
where $E_r$ is constructed as in Algorithm 1, and $T_{\theta_0}(x)$ and $V_{\theta_0}(x)$ are as in (4.6). If (4.4) and (4.5) hold, then we expect $T(r)$ to be close to 0 and $V(r)$ to be negative, respectively, for all $r$. Figure 1 plots $T(r)$ and $V(r)$ as functions of $r$, and, indeed, our expectations are mostly realized. At first look, the fluctuations in $T(r)$ seem a bit troubling, but it turns out that these are effectively dampened by the magnitude of $V(r)$. To see this, let $\psi_r(\theta)$ be the $\psi_x(\theta)$ in (4.2) such that $\rho(x) = r$. A two-term Taylor approximation of $\psi_r(\theta)$ at $\theta = \theta_0$ can be written as
$$\psi_r(\theta) - \psi_r(\theta_0) \approx V(r)(\theta - \theta_0)\left\{\frac{\theta - \theta_0}{2} + \frac{T(r)}{V(r)}\right\}.$$
So if $V(r) < 0$ and $T(r)/V(r)$ is close to zero for each $r$, then the difference should be negative and, hence, $\psi_r(\theta)$ is maximized at $\theta = \theta_0$ for each $r$. From Figure 1 it is clear that $T(r)/V(r)$ has smaller fluctuations than $T(r)$.
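The following is a minimal Python sketch of Algorithm 1 together with the diagnostics $T(r)$ and $V(r)$ in (4.7); it is not the authors' R implementation. Assumptions made here: $\mathbb{X}_\varepsilon = \{0, 1, \ldots, M\}$ with $M$ the smallest integer such that $F_{\theta_0}(M) > 1 - \varepsilon$; $\tau_r(k)$ is evaluated in the natural parametrization as $\sum_{E_r(k)} T_{\theta_0} f_{\theta_0} / \sum_{E_r(k)} f_{\theta_0}$; and the loop runs until $\mathbb{X}_\varepsilon$ is exhausted, which only makes the rank of the last remaining element explicit.

```python
import math


def poisson_pmf(x: int, theta: float) -> float:
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))


def recursive_ordering(theta0: float, eps: float = 1e-10):
    """Sketch of Algorithm 1; returns X_eps in rank order (rank 1 first) and the pmf."""
    # X_eps = {0, ..., M}, M smallest with F_theta0(M) > 1 - eps (our choice of X_eps).
    M, cdf = 0, poisson_pmf(0, theta0)
    while cdf <= 1.0 - eps:
        M += 1
        cdf += poisson_pmf(M, theta0)
    f = {x: poisson_pmf(x, theta0) for x in range(M + 1)}
    T = {x: x - theta0 for x in f}                    # score, (4.6)
    V = {x: (x - theta0) ** 2 - theta0 for x in f}    # (4.6)

    plus = [x for x in range(M + 1) if T[x] >= 0]     # X^+, ascending: min is plus[0]
    minus = [x for x in range(M + 1) if T[x] < 0]     # X^-, ascending: max is minus[-1]

    def tau(E):  # d/d(eta) log sum_{x in E} f, evaluated at theta0
        return sum(T[x] * f[x] for x in E) / sum(f[x] for x in E)

    def nu(E):   # proxy for the left-hand side of (4.5)
        return sum(V[x] * f[x] for x in E)

    order = []                                        # E_r, listed in rank order
    while plus or minus:
        if not plus:
            order.append(minus.pop())                 # add max X^-
        elif not minus:
            order.append(plus.pop(0))                 # add min X^+
        else:
            E1, E2 = order + [plus[0]], order + [minus[-1]]
            if abs(tau(E1)) < abs(tau(E2)) and nu(E1) < 0:
                order.append(plus.pop(0))
            elif nu(E2) < 0:
                order.append(minus.pop())
            else:
                break                                 # the "else stop" branch
    return order, f


def diagnostics(order, f, theta0):
    """T(r) and V(r) of (4.7) for r = 1, ..., len(order)."""
    T_r, V_r, t_sum, v_sum = [], [], 0.0, 0.0
    for x in order:
        t_sum += (x - theta0) * f[x]
        v_sum += ((x - theta0) ** 2 - theta0) * f[x]
        T_r.append(t_sum)
        V_r.append(v_sum)
    return T_r, V_r


order, f = recursive_ordering(7.0)
T_r, V_r = diagnostics(order, f, 7.0)
print(order[:10])
print(max(abs(t / v) for t, v in zip(T_r, V_r)))
```

With $\theta_0 = 7$, the ordering starts at $x = 7$ and then alternates between the two sides of $\theta_0$ according to which addition keeps the score sum closer to balance. The resulting $\rho^\star$ can be passed to a belief/plausibility evaluation based on the partial sums of Section 4.1.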
4.4 Numerical illustrations: mean only

Here we study the plausibility function $\mathrm{pl}_x(\theta_0; \mathcal{S}_\rho) = 1 - \mathrm{bel}_x(\{\theta_0\}^c; \mathcal{S}_\rho)$ based on the optimal ranking $\rho = \rho^\star$ in Section 4.3. The belief function at $\{\theta_0\}$ is zero for all $\theta_0$, so we can safely ignore it. We will compare the plausibility function behavior to that of two classical textbook methods for testing $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$.

1. Normal approximation. A naive approximation is to assume $X \sim \mathrm{N}(\theta, \theta)$. Then the textbook size-$\alpha$ normal test rejects $H_0$ based on observed $X = x$ iff $p_1(x; \theta_0) = 2 - 2\Phi(\theta_0^{-1/2}|x - \theta_0|) < \alpha$, where $\Phi$ is the standard normal distribution function. In light of (2.5), we take $p_1(x; \theta_0)$ as the "plausibility function" corresponding to this normal test procedure.

2. Poisson equal-tail approximation. A somewhat less naive size-$\alpha$ test rejects $H_0$ based on observed $X = x$ iff $F_{\theta_0}(x) < \alpha/2$ or $1 - F_{\theta_0}(x - 1) < \alpha/2$. Equivalently, this test rejects $H_0$ iff $p_2(x; \theta_0) = 2\min\{F_{\theta_0}(x), 1 - F_{\theta_0}(x - 1)\} < \alpha$. We take $p_2(x; \theta_0)$ as the "plausibility function" corresponding to this test procedure.



Figure 2 shows the distribution functions of $p_1(X; \theta_0)$, $p_2(X; \theta_0)$, and $\mathrm{pl}_X(\theta_0; \mathcal{S}_\rho)$, all treated as functions of the random variable $X \sim \mathrm{Pois}(\theta)$, for a variety of $\theta$ values, with $\theta_0 = 7$. There are two things to look for in these plots. The first, for $\theta = \theta_0$, is that the distribution function does not exceed the diagonal line corresponding to the distribution function of $\mathrm{Unif}(0, 1)$; this demonstrates the validity property. In Panel (c) we find that only the IM-based plausibility function satisfies the validity criterion. The second thing we are looking for is stochastic dominance. Specifically, if one distribution function is uniformly smaller than another distribution function, then the plausibility function corresponding to the former is stochastically larger than the one corresponding to the latter. This, in turn, means that inference based on the former will, in general, be more efficient. Panels (a) and (b) show no clear dominance, but the IM tends to outperform the normal approximation. Panels (d)-(f) show that the IM-based plausibility function stochastically dominates the other two and, hence, the corresponding inference is more efficient.

Figure 3 plots the "plausibility functions" $p_1(x; \theta)$ and $p_2(x; \theta)$, based on the frequentist methods, along with the optimal IM plausibility function, as functions of $\theta$ for various $x$ values. One general observation is that both the IM and the normal plausibility functions peak at $\theta = x$, the maximum likelihood estimate, shown by a vertical line, while the Poisson equal-tail plausibility function is off-center. The horizontal line marks the $\alpha = 0.1$ level sets, i.e., the 90% plausibility intervals. In each case, the normal plausibility interval, which corresponds exactly to the textbook confidence interval, is a hair shorter than the IM plausibility interval. However, unlike the IM plausibility interval, which has coverage guarantees via the validity theorem (see Panel (c) of Figure 2), the normal confidence interval has no such guarantees in this sort of misspecified model.
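The two textbook "plausibility functions" and the grid inversion that turns any of them into a 90% interval can be sketched as follows. The grid range and spacing are arbitrary illustration choices, and the normal distribution function comes from Python's `statistics.NormalDist`. The IM plausibility $\mathrm{pl}_x(\theta; \mathcal{S}_{\rho^\star})$ would require the ordering of Algorithm 1 to be recomputed at every grid value of $\theta$, and it is not included in this sketch.

```python
import math
from statistics import NormalDist


def poisson_cdf(x: int, theta: float) -> float:
    if x < 0:
        return 0.0
    term = total = math.exp(-theta)
    for k in range(1, x + 1):
        term *= theta / k
        total += term
    return min(total, 1.0)


def p1(x: int, theta0: float) -> float:
    """Normal approximation: 2 - 2 Phi(|x - theta0| / sqrt(theta0))."""
    return 2.0 - 2.0 * NormalDist().cdf(abs(x - theta0) / math.sqrt(theta0))


def p2(x: int, theta0: float) -> float:
    """Poisson equal-tail: 2 min{F(x), 1 - F(x - 1)}; can exceed 1."""
    return 2.0 * min(poisson_cdf(x, theta0), 1.0 - poisson_cdf(x - 1, theta0))


def plausibility_interval(pl, x: int, alpha: float = 0.1,
                          grid_max: float = 40.0, step: float = 0.001):
    """Smallest and largest grid value of theta with pl(x, theta) >= alpha."""
    kept = [k * step for k in range(1, int(grid_max / step) + 1)
            if pl(x, k * step) >= alpha]
    return (min(kept), max(kept)) if kept else None


print(plausibility_interval(p1, 7), plausibility_interval(p2, 7))
```

For $p_1$ the interval has a closed form: $\{\theta : (x - \theta)^2 \le z_{0.95}^2\,\theta\}$, whose endpoints are the roots of $\theta^2 - (2x + z_{0.95}^2)\theta + x^2 = 0$. This gives an independent check on the grid result.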

4.5 Numerical illustrations: mean plus background

As shown above, the discreteness of the Poisson random variable makes it challenging to develop an efficient IM for its mean, $\theta$. An additional challenge arises when one considers an a priori constraint on the possible values of $\theta$. An IM for the constrained Poisson mean was developed in Ermini Leaf and Liu (2012). Here, we briefly review the problem and then introduce a more efficient IM using the scheme in Section 4.3. Several frequentist methods have also been developed for this problem; see Mandelkern (2002).



Suppose, for example, that the Poisson count $X$ consists of a number of signal events, $S$, and independent background events, $B$, so that $X = S + B$. If $S \sim \mathrm{Pois}(\lambda)$ and $B \sim \mathrm{Pois}(\beta)$, then $X \sim \mathrm{Pois}(\lambda + \beta)$. Now suppose that the value of $\beta$ has been established with certainty. If $\theta = \lambda + \beta$ is the mean of $X$, then the fact that $\lambda$ must be nonnegative implies the constraint $\theta \ge \beta$. The problem with ignoring such a constraint is clear in Figure 3: $\mathrm{pl}_x(\theta)$ can be positive for any $\theta$ value, even for $\theta < \beta$. So, in light of the constraint $\theta \ge \beta$, the IM must be modified appropriately.

Technically, the problem can be seen in the auxiliary variable, $U$. After observing $x$, the constraint implies that $U$ must lie in a strict subset of $\mathbb{U}$. Applying the constraint, $\theta \in [\beta, \infty)$, to (3.1) leads to a constraint on $U$: $G_{x+1}(\beta) < u < 1$. Without considering constraints, $\mathcal{S}$ is intended to predict $U$ realizations anywhere in $\mathbb{U}$. Some members of its support $\mathbb{S}$ may be disjoint from $(G_{x+1}(\beta), 1]$; these are conflict cases. Let $S'$ be the largest $S \in \mathbb{S}$ such that $S \cap (G_{x+1}(\beta), 1] = \emptyset$. The probability on $S'$ and all its subsets is known as the conflict mass: $\mathsf{P}_{\mathcal{S}}\{\mathcal{S} \subseteq S'\} = \mathsf{bel}_x([\beta, \infty)^c; \mathcal{S})$. An IM for the constrained $\theta$ must distribute this conflict mass somewhere in the constraint set.
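The conflict mass has a closed form in a simple special case, which may help make the definition concrete. The sketch below is not the paper's construction. It assumes that $G_k$ in (3.1) is the Gamma$(k, 1)$ distribution function, so that $G_k(\theta) = \mathsf{P}\{\mathsf{Pois}(\theta) \geq k\}$; this is consistent with the constraint $G_{x+1}(\beta) < u$ above but is not shown in this section. It also uses a hypothetical symmetric predictive random set, $\mathcal{S} = [1/2 - r, 1/2 + r]$ with $r = |U' - 1/2|$ and $U' \sim \mathsf{Unif}(0, 1)$, in place of the recursively constructed $\mathcal{S}_p$. The function names `G` and `conflict_mass_symmetric` are illustrative.

```python
import math


def G(k, theta):
    """G_k(theta) = P(Gamma(k, 1) <= theta) = P(Pois(theta) >= k).

    ASSUMPTION: this is the G_k of (3.1); the text above only confirms
    that the constraint on u takes the form G_{x+1}(beta) < u.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-theta)  # P(Pois(theta) = 0)
    cdf = term
    for j in range(1, k):
        term *= theta / j  # P(Pois(theta) = j)
        cdf += term
    return max(0.0, 1.0 - cdf)  # P(Pois(theta) >= k), clipped at 0


def conflict_mass_symmetric(x, beta):
    """Conflict mass bel_x([beta, inf)^c; S) for a HYPOTHETICAL symmetric S.

    S = [1/2 - r, 1/2 + r] with r ~ Unif(0, 1/2).  A draw is a conflict
    case iff it misses (c, 1] with c = G_{x+1}(beta), i.e. iff
    1/2 + r <= c, an event of probability max(0, 2c - 1).
    """
    c = G(x + 1, beta)
    return max(0.0, 2.0 * c - 1.0)
```

For instance, with $\beta = 3$ and $x = 0$ this gives $c = 1 - e^{-3}$ and a conflict mass of $1 - 2e^{-3}$. With $x = 10$ the conflict mass is zero, because $G_{11}(3) < 1/2$. Small counts relative to $\beta$ are exactly the cases in which the constraint bites.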



The elastic belief method (Ermini Leaf and Liu 2012) expands conflict cases so that each one intersects with the constraint. In effect, the conflict mass is moved to a subset of the parameter constraint set. The proof of validity for the elastic belief method also applies to more general procedures. Therefore, it is not necessary to formulate the mathematical details of the elastic belief method in this problem. We can simply place any conflict mass on $\{\beta\}$, which is on the boundary of the constraint. The resulting plausibility function for point assertions is:



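$$
\mathsf{pl}_x(\{\theta_0\}) =
\begin{cases}
0, & \text{if } \theta_0 < \beta, \\
\mathsf{pl}_x(\{\theta_0\}; \mathcal{S}'), & \text{if } \theta_0 = \beta \text{ and } \mathsf{bel}_x([\beta, \infty)^c; \mathcal{S}_p) > 0, \\
\mathsf{pl}_x(\{\theta_0\}; \mathcal{S}_p), & \text{otherwise},
\end{cases}
$$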



where $\mathcal{S}'$ is the predictive random set implied by moving conflict cases to $\{\beta\}$, and $\mathcal{S}_p$ is the predictive random set constructed recursively in Section 4.3. In the comparisons that follow, we refer to this as the EB-SB method, for elastic belief + score-balance. The 90% EB-SB plausibility interval, when $\beta = 15$, is shown as black lines in Figure 4. The gray lines correspond to the plausibility intervals in Ermini Leaf and Liu (2012). EB-SB produces a shorter interval at each $x$ in the figure.
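The three-case plausibility function and the resulting plausibility interval, $\{\theta \geq \beta : \mathsf{pl}_x(\theta) > \alpha\}$, can be sketched as follows. The sketch makes two assumptions. First, (3.1) is taken to have the form $G_{x+1}(\theta) < U \leq G_x(\theta)$, with $G_k$ the Gamma$(k, 1)$ distribution function. Second, a hypothetical symmetric predictive random set, $[1/2 - r, 1/2 + r]$ with $r \sim \mathsf{Unif}(0, 1/2)$, stands in for the recursive $\mathcal{S}_p$ of Section 4.3. Because of the second assumption, this is not EB-SB itself and will not reproduce Figure 4. With that nested set, every non-conflict draw also meets the set of $u$ values compatible with $\theta_0 = \beta$. Moving the conflict draws to $\{\beta\}$ therefore makes $\beta$ fully plausible whenever the conflict mass is positive. All function names are illustrative.

```python
import math


def G(k, theta):
    """ASSUMED form of G_k in (3.1): P(Gamma(k, 1) <= theta) = P(Pois(theta) >= k)."""
    if k <= 0:
        return 1.0
    term = math.exp(-theta)  # P(Pois(theta) = 0)
    cdf = term
    for j in range(1, k):
        term *= theta / j  # P(Pois(theta) = j)
        cdf += term
    return max(0.0, 1.0 - cdf)


def pl_symmetric(x, theta0):
    """Unconstrained pl_x({theta0}) under a HYPOTHETICAL symmetric random set.

    theta0 is compatible with u in (a, b], a = G_{x+1}(theta0), b = G_x(theta0).
    S = [1/2 - r, 1/2 + r], r ~ Unif(0, 1/2), meets (a, b] with probability
    min(1, 2b, 2(1 - a)).
    """
    a, b = G(x + 1, theta0), G(x, theta0)
    return min(1.0, 2.0 * b, 2.0 * (1.0 - a))


def pl_constrained(x, theta0, beta):
    """Three-case plausibility: 0 below beta, conflict mass moved to {beta}."""
    if theta0 < beta:
        return 0.0
    conflict = max(0.0, 2.0 * G(x + 1, beta) - 1.0)
    if theta0 == beta and conflict > 0.0:
        # With this nested S, every non-conflict draw also meets
        # (G_{x+1}(beta), G_x(beta)], so beta becomes fully plausible.
        return 1.0
    return pl_symmetric(x, theta0)


def plausibility_interval(x, beta, alpha=0.1, upper=60.0, step=0.01):
    """Grid approximation of {theta >= beta : pl_constrained(x, theta, beta) > alpha}."""
    n = int(round((upper - beta) / step))
    grid = [beta + i * step for i in range(n + 1)]  # grid[0] equals beta exactly
    keep = [t for t in grid if pl_constrained(x, t, beta) > alpha]
    return (min(keep), max(keep)) if keep else None
```

With $\beta = 15$ and $x = 0$, the interval collapses to the boundary point $\{15\}$. With $x = 20$, it starts at $\beta$ and ends near $\theta \approx 29$. The grid is a simple choice. A root finder on the two monotone pieces of `pl_symmetric` would give exact endpoints.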

For further comparison, we consider a variety of existing methods: the confidence intervals of Feldman and Cousins (1998, FC98), Giunti (1999), Mandelkern and Schultz (2000b, MS00b), Roe and Woodroofe (2000, RW00), Roe and Woodroofe (1999) with the Mandelkern and Schultz (2000a) adjustment (RW+MS00a), and the plausibility interval of Ermini Leaf and Liu (2012, ELL12). Figure 5 shows the coverage probabilities for each interval estimate of $\lambda$, for $\beta = 3$, as a function of $\lambda \in [0, 4]$. EB-SB seems to be the best performer in the left-hand column but, in the right-hand column, there is no clear winner. Figure 6 plots the width of the nominal 90% interval estimates, as a function of data $x$, with $\beta = 3$, for the various methods described above. Here we see that the EB-SB plausibility interval is the narrowest up to $x = 5$, at which point it becomes slightly wider than the intervals of other methods. But we must reiterate: IMs are more than just tools to construct frequentist procedures. That said, it is remarkable that the EB-SB plausibility intervals are as good as or better than their competitors based on frequentist criteria.
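Coverage curves like those in Figure 5 can be computed exactly rather than by simulation, because $X$ is discrete. The coverage at $\lambda$ is $\sum_x \mathsf{P}\{X = x\}\, 1\{\lambda \in I(x)\}$ with $X \sim \mathsf{Pois}(\lambda + \beta)$. In the sketch below, the interval procedure $I$ is a hypothetical callable `interval_for_x` that maps a count to bounds for $\lambda$. The infinite sum is truncated at `x_max`. This is an illustration of the calculation, not the code behind the figure.

```python
import math


def pois_pmf(x, mu):
    """P(Pois(mu) = x), evaluated on the log scale to avoid overflow."""
    if mu <= 0.0:
        return 1.0 if x == 0 else 0.0
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))


def coverage(lam, beta, interval_for_x, x_max=200):
    """Exact coverage P{lam in I(X)} for X ~ Pois(lam + beta).

    interval_for_x is a HYPOTHETICAL callable mapping a count x to (lo, hi)
    bounds for the signal mean lam.  Counts above x_max are ignored, so
    x_max should sit far in the upper tail of Pois(lam + beta).
    """
    mu = lam + beta
    total = 0.0
    for x in range(x_max + 1):
        lo, hi = interval_for_x(x)
        if lo <= lam <= hi:
            total += pois_pmf(x, mu)
    return total


def coverage_curve(beta, interval_for_x, lam_max=4.0, n=400):
    """Coverage on an evenly spaced grid of lam in [0, lam_max]."""
    lams = [lam_max * i / n for i in range(n + 1)]
    return lams, [coverage(lam, beta, interval_for_x) for lam in lams]
```

Any interval procedure can be passed in, for example one built from a plausibility function. Plotting `coverage_curve(3.0, procedure)` against the nominal 0.9 gives the kind of sawtooth curve that discreteness produces. As a sanity check, the degenerate procedure that always reports $[0, \infty)$ has coverage 1, up to truncation.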

5 Discussion 

Inference on a Poisson mean is an important and challenging problem, arising both clas- 
sically and in modern applications. Here we have developed a new theoretical and com- 
putational approach for optimal inference in this problem. The main contribution is our 
construction of an (approximately) optimal predictive random set via a novel recursive 
ordering algorithm. We also developed the EB-SB method to handle the more challenging 
problem of inference about a Poisson mean when non-stochastic constraint information
is available, which may be useful to high-energy physicists working on applications in 
this area. Also, the techniques described herein are, for the most part, not special to 
the Poisson problem. So, other challenging discrete data problems (e.g., binomial) can 
be handled similarly, and we expect that the corresponding optimal IM will outperform 
existing methods there as well. 

Numerical results focused primarily on comparing various methods in terms of fre- 
quentist performance. But we want to reiterate once more that IMs, and the belief 
and plausibility functions derived from them, are more than just tools for developing 
frequentist procedures. Indeed, IMs can be used to produce prior-free posterior proba- 
bilistic summaries of evidence in observed data for and against any assertion about the 
parameter of interest. Moreover, this inferential output is meaningful both within and 
across experiments in the sense described in Section 1. It is especially important that
these claims hold even for singleton assertions/point null hypotheses, problems of extreme 
scientific importance for which existing approaches, in general, cannot give satisfactory 
probabilistic assessments of uncertainty. 

From a philosophical point of view, the IM framework, in general, helps tie together 
a number of elusive topics. First, it identifies and corrects the inherent selection bias in 
Fisher's fiducial probabilities. Roughly speaking, the fiducial probability for an assertion involves a $\mathsf{P}_U$-probability calculation on a data-dependent event in $\mathbb{U}$, and these probabilities tend to be too large for validity to hold. By choosing an admissible predictive random set, the corresponding belief probability is shrunk down enough for validity to be achieved, thereby correcting the fiducial bias. Second, by making an optimal choice of IM, the corresponding plausibility function at $A$ can be shown to equal Fisher's p-value for $H_0: \theta \in A$. There is well-documented difficulty in the interpretation of p-values; i.e., they are not bona fide probabilities for the truthfulness of $H_0$ because they require conditioning on $\theta \in A$, etc. However, it can be shown that there exists a meaningful IM with the Fisher p-value equal to the easy-to-interpret plausibility for the truth of the claim "$\theta \in A$"; no conditioning on the truthfulness of the claim is needed.
Figure 1: Numerical checks that Algorithm 1 produces a ranking p such that (4.4) and (4.5) approximately hold. Here r is the index in Algorithm 1, and T(r) and V(r) are defined in (4.7). The top row is for $\theta_0 = 5$ and the second row for $\theta_0 = 10$; the same vertical axis scale is used in both rows.
Figure 2: Plots of the distribution function (CDF) of $\mathsf{pl}_X(\theta_0)$, when $X \sim \mathsf{Pois}(\theta)$, for $\theta_0 = 7$ and various $\theta$'s. In each panel, the two gray lines correspond to the two "frequentist plausibility functions" described in the text; the black line corresponds to the optimal IM plausibility function. Each is based on 100,000 Monte Carlo samples.
Figure 3: Plots of $\mathsf{pl}_x(\theta)$, as a function of $\theta$, for various $x$ values. In each panel, solid and dashed gray lines are "plausibility functions" $p_1(x; \theta)$ and $p_2(x; \theta)$, respectively, and the solid black line is the optimal IM plausibility function.
Figure 4: 90% plausibility intervals for $\theta$ with $\beta = 15$. The black and gray lines are the intervals based on EB-SB and the method in Ermini Leaf and Liu (2012), respectively.
Figure 5: Coverage probability comparisons for the nominal 90% EB-SB plausibility intervals (black) against various confidence intervals (gray) for $\lambda \in [0, 4]$, with $\beta = 3$.
Figure 6: Width of the various nominal 90% plausibility/confidence intervals for $\lambda$, with $\beta = 3$, as a function of data $x$: EB-SB (black); all others except ELL12 (gray).